Hands-on Exercise 3_2: Programming Animated Statistical Graphics with R

Author

Wei Yanrui

Published

January 28, 2024

Modified

January 30, 2024

1. Overview of animation

1.1 Concepts

The workflow of animations:

Dataset -> many data subsets, split by the animated variable (e.g. time period) -> one individual plot per subset -> motion, rendered by stitching the plots together as frames and displaying them sequentially over time
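The workflow above can be sketched by hand with a toy data frame. This is only an illustration of the idea, not how gganimate or plotly work internally; the data values and the names toy, subsets and frames are made up for the example.

```r
library(ggplot2)

# Toy data: two countries observed in three years (values are invented).
toy <- data.frame(
  Year    = rep(c(2000L, 2005L, 2010L), each = 2),
  Country = rep(c("A", "B"), times = 3),
  Old     = c(5, 8, 6, 9, 7, 11),
  Young   = c(40, 30, 38, 28, 35, 25)
)

# Step 1: one data subset per value of the animated variable (Year).
subsets <- split(toy, toy$Year)

# Step 2: one plot per subset. The axis limits are fixed from the full
# dataset so that positions are comparable from frame to frame.
frames <- lapply(names(subsets), function(yr) {
  ggplot(subsets[[yr]], aes(x = Old, y = Young, colour = Country)) +
    geom_point(size = 4) +
    coord_cartesian(xlim = range(toy$Old), ylim = range(toy$Young)) +
    labs(title = paste("Year:", yr))
})

# Step 3: displaying the plots one after another gives the motion.
# Here the frames are simply printed in order.
for (f in frames) print(f)
```

Animation packages automate these steps and also interpolate between frames, which is why the rest of this exercise never builds the subsets explicitly.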

1.2 Terminology

  • Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data

  • Animation Attributes: The settings that control how the animation behaves, for example the duration of each frame and the easing function used to transition between frames.

2. Getting Started

2.1 Install R Packages

In this hands-on exercise, six R packages are loaded: the five listed below, plus readxl, which is described in the code notes.

  • plotly: for plotting interactive statistical graphs

  • gganimate: a ggplot2 extension for creating animated statistical graphs

  • gifski: converts video frames to GIF animations

  • gapminder: An excerpt of the data available at Gapminder.org

  • tidyverse: a family of R packages designed to support data science, analysis and communication tasks, including creating static statistical graphs

pacman::p_load(readxl, gifski, gapminder, plotly, gganimate, tidyverse)
Code Notes
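  1. readxl: reads Microsoft Excel files (.xls and .xlsx) and converts them into data frames in R

  2. pacman::p_load(): checks whether each package is installed, installs any that are missing, and then loads them all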

2.2 Import data

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet = "Data") %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

The “Data” sheet records, for each country and year, the population together with the percentages of young (Young) and old (Old) people.

Code Notes
  1. col <- c("Country","Continent"): create a string vector, including 2 string elements “Country” and “Continent”

  2. read_xls("file path",sheet="sheet name"): read excel file from specific file path, and choose specific sheet inside this file, assign it to a dataframe called “globalPop”

  3. %>%: pipe operator. A %>% B: pass the result from process A to B for further processing in B

  4. mutate:

    4.1 mutate_at(): transforms the specified columns. It is superseded in current dplyr; the equivalent call is mutate(across(all_of(col), as.factor))

    4.2 mutate(): modifies existing columns or creates new columns

  5. as.factor: convert the data type of one column into factor

  6. as.integer(): convert the data type of one column into integer
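To make the mutate_at() note concrete, the sketch below applies both spellings to a small made-up tibble and checks that they give the same result. The toy data and the names old_style and new_style are assumptions for illustration; only col mirrors the import code above.

```r
library(dplyr)

# Toy stand-in for the imported sheet (values are invented).
toy <- tibble(
  Country   = c("A", "B"),
  Continent = c("Asia", "Europe"),
  Year      = c(2000, 2005)   # stored as double, as numeric Excel columns usually are
)
col <- c("Country", "Continent")

# Superseded spelling, as used in the import code.
old_style <- toy %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

# Current spelling with across(); all_of() selects columns by a character vector.
new_style <- toy %>%
  mutate(across(all_of(col), as.factor),
         Year = as.integer(Year))

# Both versions should produce the same column types.
sapply(new_style, class)
```

Either form works for this exercise; across() is the one the dplyr documentation recommends for new code.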

3. Animated Data Visualization: gganimate methods

3.1 Functions inside gganimate

  • transition_*(): defines how the data should be spread out and how it relates to itself across time

  • view_*(): defines how the positional scales should change along the animation

  • shadow_*(): defines how data from other points in time should be presented in the given point in time

  • enter_*()/exit_*(): defines how new data should appear and how old data should disappear during the course of the animation

  • ease_aes(): defines how different aesthetics should be eased during transitions
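As a rough illustration of how these function families combine, the sketch below adds one member of each to a single plot. It uses the gapminder data frame shipped with the gapminder package rather than globalPop, so that it runs without the Excel file; the particular choices (shadow_wake(), view_follow(), enter_fade(), exit_fade()) are just examples, not the settings used later in this exercise.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim <- ggplot(gapminder,
               aes(x = gdpPercap, y = lifeExp,
                   size = pop, colour = continent)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_x_log10() +
  transition_time(year) +           # transition_*(): year drives the animation
  view_follow(fixed_y = TRUE) +     # view_*(): x range follows the data, y stays fixed
  shadow_wake(wake_length = 0.1) +  # shadow_*(): short trail of earlier positions
  enter_fade() +                    # enter_*(): new points fade in
  exit_fade() +                     # exit_*(): removed points fade out
  ease_aes("cubic-in-out")          # ease_aes(): smooth acceleration between states

# Printing `anim` (or calling animate(anim)) would render the frames.
```

Each component is added with + like any other ggplot2 layer, so they can be switched on and off independently while experimenting.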

3.2 Build a static population bubble plot

Use basic ggplot2 functions to create a static bubble plot.

ggplot(globalPop, aes(x=Old, y=Young,
                      size=Population,
                      colour=Country))+
  geom_point(alpha=0.7,
             show.legend=FALSE)+
  scale_colour_manual(values=country_colors)+
  scale_size(range = c(2,12))+
  labs(title = 'Year: {frame_time}',
       x='% Aged',
       y='% Young')

Code Notes
  1. show.legend=FALSE: it’s being used inside geom_point() to hide the legend of points. If you want to hide the legend of the whole graph, apply theme(legend.position="none") to ggplot() object.

  2. scale_colour_manual(): defines the colour mapping manually, which allows you to assign a specific colour to each country.

    2.1 values: accepts a vector of colours; when the vector is named, each colour is matched to the country with the same name.

    2.2 country_colors: a named colour vector supplied by the gapminder package, with one colour per country in the Gapminder data

  3. scale_size(): here defines the size range of points

  4. aes(size=Population): defines the size of each point in the scatter plot according to the population of this data point

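  5. range = c(2,12): the smallest Population value is drawn at size 2 and the largest at size 12, with values in between scaled accordingly

  6. title = 'Year: {frame_time}': {frame_time} is a gganimate placeholder that is only filled in once transition_time() is added. In this static plot the title shows the literal text.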

3.3 Build the animated bubble plot

Create animated bubble plot by using:

  • transition_time(): to create transition through distinct states in time (i.e. Year)

  • ease_aes(): to control easing of aesthetics. The default is linear. Other easing functions are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce. Each can take a modifier suffix (-in, -out or -in-out), for example ease_aes("cubic-in-out").

ggplot(globalPop, aes(x=Old,y=Young,
                      size=Population,
                      colour=Country))+
  geom_point(alpha=0.7,
             show.legend = FALSE)+
  scale_colour_manual(values = country_colors)+
  scale_size(range=c(2,12))+
  labs(title="Year:{frame_time}",
       x="% Aged",
       y="% Young")+
  transition_time(Year)+
  ease_aes("linear")
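Printing a gganimate object renders it with default settings. For more control over frame count, speed and output file, animate() and anim_save() can be used. The sketch below is a minimal example under assumed settings: it uses the gapminder data frame so it runs on its own, and the frame count, dimensions and the file name bubble.gif are arbitrary choices.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp,
                size = pop, colour = continent)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("linear")

# Render a short, small animation; fewer frames keeps rendering fast.
gif <- animate(p,
               nframes = 24, fps = 8,
               width = 480, height = 360,
               renderer = gifski_renderer())

# Write the rendered animation to a temporary file.
out_file <- file.path(tempdir(), "bubble.gif")
anim_save(out_file, animation = gif)
```

The renderer is spelled out here only for clarity. Raising nframes gives smoother motion at the cost of a longer render and a larger file.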

4. Animated Data Visualization: plotly

  • frame: specifies the variable whose values define the animation frames; each distinct value (e.g. each Year) becomes one frame

  • ids: ensures smooth transitions between objects with the same id (which helps facilitate object constancy)
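A minimal sketch of the two arguments, again using the gapminder data frame so that it runs without the Excel file. The variable choices are assumptions for illustration only.

```r
library(plotly)
library(gapminder)

p <- plot_ly(gapminder,
             x = ~gdpPercap,
             y = ~lifeExp,
             frame = ~year,    # one animation frame per distinct year
             ids = ~country,   # the same country keeps the same marker across frames
             type = "scatter",
             mode = "markers") %>%
  layout(xaxis = list(type = "log"))

# Printing `p` shows the plot with a play button and a year slider.
```

Without ids, markers are matched between frames by row order, so supplying ids matters most when rows are not sorted identically within every frame.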

4.1 Build an animated bubble plot with ggplotly()

gg <- ggplot(globalPop,
             aes(x=Old,
                 y=Young,
                 size=Population,
                 colour=Country))+
  geom_point(aes(size=Population,
                 frame=Year),
             alpha=0.7,
             show.legend=FALSE)+
  scale_colour_manual(values=country_colors)+
  scale_size(range=c(2,12))+
  labs(x="% Aged",y="% Young")

ggplotly(gg)
Code Notes
  1. 1st size=Population: is in the global aes() mapping, which applies this mapping across all layers, meaning that all layers that support size mapping will adjust the size of their elements based on the ‘Population’ variable

  2. 2nd size=Population: is in the local aes() mapping of geom_point(). It is redundant here, because a mapping defined in the global aes() is inherited by every layer unless that layer overrides it. A local mapping is only needed when one layer should use a different mapping from the others. Note that frame=Year is not a ggplot2 aesthetic, so ggplot2 warns that it is ignoring an unknown aesthetic; ggplotly() is what picks it up to build the animation.

  3. Although show.legend = FALSE was used, the legend still appears on the plot after conversion with ggplotly(). To overcome this problem, theme(legend.position='none') should be used, as shown in the code chunk below.

gg <- ggplot(globalPop,
             aes(x=Old,
                 y=Young,
                 size=Population,
                 colour=Country))+
  geom_point(aes(size=Population,
                 frame=Year),
             alpha=0.7,
             show.legend=FALSE)+
  scale_colour_manual(values=country_colors)+
  scale_size(range=c(2,12))+
  labs(x="% Aged",y="% Young")+
  theme(legend.position ="none")

ggplotly(gg)

4.2 Build an animated bubble plot with plot_ly()

bp <- globalPop %>%
  plot_ly(x=~Old,
          y=~Young,
          size=~Population,
          color=~Continent,
          sizes=c(2,100),
          frame=~Year,
          text=~Country,
          hoverinfo="text",
          type="scatter",
          mode="markers") %>%
  layout(showlegend=FALSE)
bp
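The playback of a plotly animation can be tuned with animation_opts(), animation_slider() and animation_button(). The sketch below is one possible configuration, not part of the original exercise: it uses the gapminder data frame so it runs on its own, and the timings, the slider prefix and the button position are arbitrary choices.

```r
library(plotly)
library(gapminder)

bp2 <- gapminder %>%
  plot_ly(x = ~gdpPercap,
          y = ~lifeExp,
          size = ~pop,
          color = ~continent,
          frame = ~year,
          ids = ~country,
          text = ~country,
          hoverinfo = "text",
          type = "scatter",
          mode = "markers") %>%
  layout(xaxis = list(type = "log"), showlegend = FALSE) %>%
  # 1000 ms per frame, of which 500 ms is spent on the transition
  animation_opts(frame = 1000, transition = 500,
                 easing = "linear", redraw = FALSE) %>%
  # label the slider's current value
  animation_slider(currentvalue = list(prefix = "Year: ")) %>%
  # move the play button to the bottom-right corner
  animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom")

bp2
```

The same three functions can be piped onto bp from the chunk above in exactly the same way.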